Clustering Categorical Data based on Information Loss Minimization
Abstract
As databases continue to grow, understanding their structure becomes more difficult. The lack of documentation and the unavailability of the original database designers make it harder still for researchers and professionals to understand large and complex databases. At the same time, data sources are distributed over several sites, and their integration introduces anomalies and often results in “dirty” databases, i.e., databases that contain erroneous or duplicate records. Our research focuses on applying data mining, and in particular clustering techniques, to aid the process of recovering and understanding high-level views of data sets.
Similar resources
A Clustering Algorithm for Categorical Data Based on Combining Measures
Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters are as different as possible. Clustering algorithms are divided into two categories according to the type of data: clustering algorithms for numerical data and clustering algor...
DIVCLUS-T: A monothetic divisive hierarchical clustering method
DIVCLUS-T is a divisive hierarchical clustering algorithm based on a monothetic bipartitional approach allowing the dendrogram of the hierarchy to be read as a decision tree. It is designed for either numerical or categorical data. Like the Ward agglomerative hierarchical clustering algorithm and the k-means partitioning algorithm, it is based on the minimization of the inertia criterion. Howev...
A Framework for Clustering Mixed Attribute Type Datasets
We propose a clustering framework that supports clustering of datasets with mixed attribute types (numerical, categorical) while minimizing information loss during clustering. Real-world datasets, such as medical datasets and their ontologies, contain mixed attribute types. However, most conventional clustering algorithms have been designed and applied to datasets containing only single attribut...
A cluster ensemble method for clustering categorical data
Categorical data clustering (CDC) and cluster ensemble (CE) have long been considered separate research and application areas. The main focus of this paper is to investigate the commonalities between these two problems and to use these commonalities to create new clustering algorithms for categorical data based on cross-fertilization between the two disjoint research fields. M...
A Link-Based Cluster Collection Approach Combined Contagious Cluster With For Categorical Data Clustering
Data clustering is a challenging task in data mining. Various clustering algorithms have been developed to cluster or categorize datasets, and many of them are used to cluster categorical data. Some algorithms, however, cannot be applied directly to categorical data. Several attempts have been made to solve the problem of clustering categorical data via cluster ensembles. But th...
Publication date: 2003